Abstract—We propose an algorithm for distributed charging control of electric vehicles (EVs) using online learning and online convex optimization. Many distributed charging control algorithms in the literature implicitly assume fast two-way communication between a distribution company and EV customers. This assumption is impractical at present and raises privacy and security concerns. Our algorithm does not rely on this assumption, at the expense of slower convergence to the optimal solution. The proposed algorithm requires only one-way communication, which is implemented through the distribution company publishing the pricing profiles of the previous days. We provide convergence results for the algorithm and illustrate the results through numerical examples.

Index Terms—Charging control, demand response, electric vehicles, online convex optimization, online learning, regret minimization

I. Introduction 

Demand response (DR) is an important functionality for next-generation power systems. It empowers the distribution company and its customers to decide collectively, but in a distributed manner, the best way to schedule energy usage. The reader is referred to [?], [?], [?], and references therein for the details of DR. This paper focuses on the flexible load capability offered by electric vehicles (EVs) owned by residential customers.

Large-scale integration of EVs imposes a significant burden on the grid. In particular, the creation of new peaks, peak load amplification [?], and voltage deviations [?], among other effects, have been identified as major concerns. To cope with these issues, many algorithms have been proposed to schedule the charging of EVs, e.g., [?], [?], [20], [?], [26]. We are particularly interested in algorithms such as those proposed in [?], [?], [?], [?], which drive the total load profile to a desired one (for instance, a valley-filling profile) through appropriate price signals transmitted to the owners.

Many of the existing algorithms have analytical convergence guarantees and do not require the customers to share their charging constraints with the distribution company. On the other hand, they require a series of messages to be exchanged between the distribution company and the customers regarding possible price profiles and the desired charging profiles in response. As the available power supply and the customers' requirements for charging their EVs change from day to day, these messages need to be exchanged daily to calculate the charging profiles. Since these message exchanges must be completed before the EVs can begin charging, the algorithms implicitly assume communication infrastructure and protocols that support low-latency two-way communication between the distribution company and the EV customers. Such infrastructure and protocols have not yet been deployed extensively [?]. Furthermore, the transmitted data carry information about the constraints faced by individual customers, which raises privacy and security concerns.

Motivated by this issue, we propose a distributed charging control algorithm based on online learning and online convex optimization. This algorithm requires only one-way communication from the distribution company to the customers. Furthermore, this communication carries only information about the pricing profiles of previous days.

In our formulation, we model the distribution company and every EV customer as decision makers who wish to optimize their own utility functions. For the distribution company, the payoff is maximized if the total load profile over a day is valley-filling [?], [?]. For an EV customer, the utility function is maximized if the cost to charge the EV over a day is minimized. By designing a suitable pricing policy, the distribution company aims to ensure that the charging profiles followed by the customers aggregate to a valley-filling profile. Our distributed charging control algorithm is based on an online learning and online convex optimization framework. The only communication that occurs is the distribution company notifying every customer of the pricing profiles incurred over the previous days. The online learning framework is widely used in the online convex optimization and machine learning communities (see, e.g., [?], [?], [?], [?], [?], and references therein). We use a regret minimization algorithm [23] within the online learning framework. The regret minimization algorithm uses regret as the performance measure and provides an iterative way for every decision maker to update its policy such that, at convergence, the policy is optimal in a suitably defined sense.

Informally, the regret minimization algorithm operates as follows. Consider a situation in which multiple decision makers need to optimize their own individual utility functions while satisfying coupled constraints. Furthermore, the utility functions of the decision makers may also be coupled through the actions of multiple decision makers and through the environment. The environment consists of the factors that the decision makers cannot control; it is uncertain and time-varying. Decisions are made repeatedly, and the resulting payoffs are used to improve the decision policies of the decision makers. Thus, in every iteration, every decision maker makes a decision given its current policy and the realization of the environment. Given the decisions of the decision makers and the realization of the environment, the decision makers obtain their payoffs.

Our contributions are two-fold. First, we present a regret 
minimization based distributed algorithm for charging control 
of EVs that requires only one-way communication. Second, 
we allow heterogeneous utility functions for the various EV 
owners, as would be the case when customers vary in the 
elasticity of shifting their loads in response to prices. 

Some relevant references that apply regret minimization to DR are [14], [16], [25]. In [16], real-time electricity pricing strategies for DR are designed using regret minimization. However, the focus of that work is on optimizing the utility function of the distribution company, and the customer behavior is assumed to be such that the load change is linear in the price variation. The objective of [14] is to design pricing policies for customers with price-responsive loads. The exact demand function of the customer is assumed to be unknown to the pricing policy maker. In [14], the distribution company is the only decision maker, whereas in our work, the distribution company and the EV customers are all decision makers. In [25], regret minimization is used to learn the charging behavior of the EV customers. The price responsiveness of a community of customers is captured through a conditional random field model, and the regret minimization algorithm is adopted to learn the parameters of the model.

The remainder of the paper is organized as follows. Section II presents the problem formulation. The basic framework and main results are presented in Section III. Section IV extends the basic framework to some practical charging scenarios. Numerical examples are given in Section V. Section VI concludes the paper.

II. Problem Formulation 

We consider a scenario in which $N$ customers schedule the charging of their electric vehicles (EVs) daily. The charging needs to be completed within a day. Let there be $T$ time slots in a day, and denote the set of these time slots by $\mathcal{T} := \{1, \dots, T\}$. Denote the set of EVs by $\mathcal{N} := \{1, \dots, N\}$. Denote the base load on day $k$ by $D^k(t) \in \mathbb{R}$, $t \in \mathcal{T}$. We assume that this base load is unknown to the EV customers and to the distribution company at the beginning of the $k$-th day, when the charging schedules are fixed. Furthermore, the base load may vary from day to day. Denote by $x_i^k(t) \in \mathbb{R}$ the charging rate of the $i$-th EV in the $t$-th time slot on the $k$-th day. The charging profile of the $i$-th EV on the $k$-th day is denoted by the vector $x_i^k := [x_i^k(1), x_i^k(2), \dots, x_i^k(T)]^\top$. The aggregated charging profile of the EV customers is described by the vector $x^k := (x_1^k, \dots, x_N^k)$. Let $x_i^{\mathrm{low}}(t)$ and $x_i^{\mathrm{up}}(t)$ denote the minimum and maximum charging rates of the $i$-th EV in the $t$-th time slot, and let $S_i$ denote the desired total charge of the $i$-th EV at the end of the $k$-th day; thus $S_i = \sum_{t\in\mathcal{T}} x_i^k(t)$ for every $k$. The total load as seen by the distribution company is the sum of the base load and the charging rates adopted by the EVs.

The objective of the distribution company is to achieve a total load profile that is valley-filling while ensuring that both the base load and the EVs are supplied with the required amount of energy. The base load is inflexible, while the EV charging profile may be shaped as long as the desired amount of energy is delivered by the end of the day. The goal of the distribution company can be described as obtaining the aggregated charging profile $x^k$, for every $k \in \mathbb{N}_{>0}$, that solves

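$$\begin{aligned} \underset{x^k}{\text{minimize}} \quad & c_u^k(x^k) \\ \text{subject to} \quad & x_i^{\mathrm{low}}(t) \le x_i^k(t) \le x_i^{\mathrm{up}}(t), \quad t \in \mathcal{T},\ i \in \mathcal{N}, \\ & \textstyle\sum_{t\in\mathcal{T}} x_i^k(t) = S_i, \quad i \in \mathcal{N}, \end{aligned} \tag{1}$$

where the distribution company cost function $c_u^k$ is chosen as

$$c_u^k(x^k) := \sum_{t\in\mathcal{T}} \Big(D^k(t) + \sum_{i\in\mathcal{N}} x_i^k(t)\Big)^2. \tag{2}$$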

The cost function (2) for the distribution company is also considered in [9], [10]. By solving (1), the distribution company can obtain a valley-filling total load profile while satisfying the charging requests of the customers.

To incentivize the customers to choose charging profiles that in aggregate minimize the cost (2), the distribution company designs suitable pricing profiles for the power supplied to the EVs. Every EV customer fixes her charging schedule at the beginning of the day based on information about her own constraints and any information provided by the distribution company. A price-sensitive EV customer seeks to minimize the total cost of charging by suitably shaping her charging schedule. Thus, the optimization problem for each such customer $i \in \mathcal{N}$ is to design a charging profile $x_i^k$, $k \in \mathbb{N}_{>0}$, that solves

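$$\begin{aligned} \underset{x_i^k}{\text{minimize}} \quad & c_i^k(x_i^k, x_{-i}^k) \\ \text{subject to} \quad & x_i^{\mathrm{low}}(t) \le x_i^k(t) \le x_i^{\mathrm{up}}(t), \quad t \in \mathcal{T}, \\ & \textstyle\sum_{t\in\mathcal{T}} x_i^k(t) = S_i, \end{aligned} \tag{3}$$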

where $c_i^k$ is a convex function of $x_i^k$ and is also a function of the other customers' charging profiles $x_{-i}^k := \{x_j^k : j \in \mathcal{N},\ j \neq i\}$. Since the pricing policy is possibly a function of the base load $D^k$ and of the other customers' charging profiles, $c_i^k$ inherits these features as well.

The information flow is as follows. The distribution company monitors the total load and publishes the price profile
for the previous day as realized according to a fixed and 
known pricing policy. The customers decide on the charging 
schedules for the next day with access to these pricing profiles 
for the previous days. No other communication occurs between 
the distribution company and the customers, or among the 
customers. 

III. Online Learning Framework 

We now adopt a regret minimization framework to solve both problem (1) and problem (3). The regret minimization framework is used for online learning and optimization and requires only one-way message exchanges from the distribution company to the customers.

Let $L$ be a $\lambda$-strongly convex function with respect to a given norm $\|\cdot\|$. Let $D_L(\cdot,\cdot)$ denote the Bregman divergence [23] with respect to $L$, i.e., $D_L(x,y) := L(x) - L(y) - \nabla L(y)^\top (x - y)$. Let $\|\cdot\|_*$ denote the norm that is dual to $\|\cdot\|$. Let $\nabla L$ denote the gradient of $L$, and let $\nabla L^{-1}$ denote the inverse mapping of $\nabla L$. For example, if $L$ is the squared Euclidean norm, i.e., $L(\cdot) = \|\cdot\|_2^2$, then the corresponding Bregman divergence $D_L(x,y)$ is equal to the squared Euclidean distance $\|x - y\|_2^2$, the inverse mapping $\nabla L^{-1}(x)$ is equal to $\frac{1}{2}x$, and the dual norm is the Euclidean norm.
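The squared-Euclidean example can be checked numerically. The sketch below (Python with NumPy; the helper names are ours, chosen for illustration) implements the definition of the Bregman divergence and confirms, for $L(\cdot) = \|\cdot\|_2^2$, that $D_L(x,y) = \|x-y\|_2^2$ and that $u \mapsto u/2$ inverts $\nabla L(v) = 2v$.

```python
import numpy as np

def bregman(L, grad_L, x, y):
    """Bregman divergence D_L(x, y) = L(x) - L(y) - grad_L(y)^T (x - y)."""
    return L(x) - L(y) - grad_L(y) @ (x - y)

def sq_norm(v):
    return float(v @ v)        # L(v) = ||v||_2^2

def grad_sq_norm(v):
    return 2.0 * v             # grad L(v) = 2 v

def inv_grad_sq_norm(u):
    return 0.5 * u             # inverse mapping of grad L: u -> u / 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
d = bregman(sq_norm, grad_sq_norm, x, y)   # compare with sq_norm(x - y)
```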

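Customer perspective: For the $i$-th EV customer, the decision variable is her charging profile $x_i^k$ on the $k$-th day. The set of feasible charging profiles is given as

$$\mathcal{F}_i := \Big\{ x_i \in \mathbb{R}^T \;\Big|\; x_i^{\mathrm{low}}(t) \le x_i(t) \le x_i^{\mathrm{up}}(t),\ t \in \mathcal{T},\ \sum_{t\in\mathcal{T}} x_i(t) = S_i \Big\}. \tag{4}$$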


We define $x_i^*$ as

$$x_i^* := \arg\min_{x_i \in \mathcal{F}_i} \sum_{k=1}^K c_i^k(x_i). \tag{5}$$


In the regret minimization framework, the notion of regret is used to measure the performance of an online algorithm [?], [?]. For customer $i \in \mathcal{N}$, the customer regret $R_i$ after $K$ days is defined as the difference between the cumulative cost of the charging profiles $x_i^k$, $k = 1, \dots, K$, generated by an online algorithm and the cumulative cost of $x_i^*$, i.e.,

$$R_i(K, x_i^k) := \sum_{k=1}^K c_i^k(x_i^k) - \min_{x_i \in \mathcal{F}_i} \sum_{k=1}^K c_i^k(x_i). \tag{6}$$


As mentioned in Section II, $c_i^k$ is possibly a function of the base load $D^k$ and of the other customers' charging profiles $x_j$, $j \in \mathcal{N}$, $j \neq i$. Since the base load and the charging profiles may change from day to day, $c_i^k$ may change from day to day as well. Because $x_i^*$ is a single profile that is kept the same from one day to the next, it is in general a suboptimal solution of (3) on any given day. The regret $R_i$ measures the difference between the performance of the charging profiles generated by an online algorithm and the performance obtained by the suboptimal charging profile $x_i^*$. Notice that $x_i^*$ can only be calculated in hindsight, after $K$ days have elapsed.
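To see how the hindsight comparator enters (6), consider the simplifying assumption (ours, not the paper's) that each day's cost is linear, $c_i^k(x_i) = (p^k)^\top x_i$ for a fixed price vector $p^k$. Then $x_i^*$ in (5) minimizes $(\sum_k p^k)^\top x_i$ over $\mathcal{F}_i$, which is solved greedily by filling the cheapest slots first. The Python sketch below uses hypothetical function names and toy prices.

```python
import numpy as np

def best_fixed_profile(total_price, lo, up, S):
    """Minimize total_price^T x over F_i = {lo <= x <= up, sum(x) = S}.

    For a linear objective, start from the lower bounds and put the remaining
    charge S - sum(lo) into the cheapest slots first.
    Assumes sum(lo) <= S <= sum(up), i.e., F_i is nonempty.
    """
    x = np.asarray(lo, dtype=float).copy()
    remaining = S - x.sum()
    for t in np.argsort(total_price):          # cheapest slots first
        add = min(up[t] - x[t], remaining)
        x[t] += add
        remaining -= add
        if remaining <= 0:
            break
    return x

def static_regret(prices, played, lo, up, S):
    """Regret (6) under the simplifying assumption c_i^k(x) = prices[k]^T x."""
    incurred = sum(float(p @ x) for p, x in zip(prices, played))
    x_star = best_fixed_profile(np.sum(prices, axis=0), lo, up, S)  # hindsight comparator (5)
    hindsight = sum(float(p @ x_star) for p in prices)
    return incurred - hindsight

# Toy example (hypothetical numbers): T = 3 slots, K = 3 days, uniform charging every day.
prices = [np.array([1.0, 2.0, 3.0]), np.array([2.0, 2.0, 1.0]), np.array([1.0, 1.0, 5.0])]
lo, up, S = np.zeros(3), np.full(3, 2.0), 2.0
played = [np.full(3, S / 3)] * 3
x_star = best_fixed_profile(np.sum(prices, axis=0), lo, up, S)
regret = static_regret(prices, played, lo, up, S)
```

The comparator needs all $K$ price vectors at once, which is why it is only available in hindsight.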

We adopt the optimistic mirror descent (OMD) algorithm [23] to generate charging profile updates that minimize the regret (6). On each day, the regret minimization algorithm generates the charging profile update without knowing the current objective function (or its gradient). Specifically, the OMD algorithm iteratively applies the updates

$$\begin{aligned} h_i^{k+1} &= \nabla L_i^{-1}\big(\nabla L_i(h_i^k) - \eta_i \nabla c_i^k(x_i^k)\big), \\ x_i^{k+1} &= \arg\min_{x_i \in \mathcal{F}_i} \ \eta_i x_i^\top M_i^{k+1} + D_{L_i}(x_i, h_i^{k+1}), \end{aligned} \tag{7}$$

where $\eta_i > 0$ is an algorithm parameter (the step size) and $h_i^k$ is an intermediate update of the charging profile. For ease of presentation, for the vector $h_i^k \in \mathbb{R}^T$, $L_i$ is set to $L_i(h_i^k) = \frac{1}{2}\|h_i^k\|_2^2$. The $i$-th customer may have a prediction $M_i^k$ of the gradient of the cost function $c_i^k$. For example, on day $k$, for customer $i \in \mathcal{N}$, one possible choice of the prediction is the average of the gradients of the cost functions of the previous days, namely, $M_i^k = \frac{1}{k-1}\sum_{\kappa=1}^{k-1} \nabla c_i^\kappa(x_i^\kappa)$. Intuitively, the iteration (7) moves the charging profile in the negative gradient direction and projects it onto the set of feasible charging profiles.
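With $L_i(h) = \frac{1}{2}\|h\|_2^2$, the mapping $\nabla L_i$ is the identity and $D_{L_i}(x, h) = \frac{1}{2}\|x - h\|_2^2$, so one step of (7) is a gradient step followed by a Euclidean projection of $h_i^{k+1} - \eta_i M_i^{k+1}$ onto $\mathcal{F}_i$. The following Python sketch assumes that choice of $L_i$; the function names are hypothetical, and the projection uses bisection on the multiplier of the sum constraint, one standard way to project onto a box intersected with a hyperplane (not necessarily the authors' implementation).

```python
import numpy as np

def project_onto_feasible_set(h, lo, up, S, iters=100):
    """Euclidean projection of h onto F_i = {x : lo <= x <= up, sum(x) = S}.

    The KKT conditions give x = clip(h - nu, lo, up) for a scalar multiplier nu.
    sum(x) is nonincreasing in nu, so nu is located by bisection.
    Assumes sum(lo) <= S <= sum(up), i.e., F_i is nonempty.
    """
    nu_low = np.min(h - up)    # every entry clipped to up: sum = sum(up) >= S
    nu_high = np.max(h - lo)   # every entry clipped to lo: sum = sum(lo) <= S
    for _ in range(iters):
        nu = 0.5 * (nu_low + nu_high)
        if np.clip(h - nu, lo, up).sum() > S:
            nu_low = nu
        else:
            nu_high = nu
    return np.clip(h - 0.5 * (nu_low + nu_high), lo, up)

def omd_step(h, grad, prediction, eta, lo, up, S):
    """One OMD update (7) with L_i(h) = 0.5 * ||h||_2^2 (grad L_i is the identity).

    The mirror step reduces to h - eta * grad, and the Bregman projection
    argmin_x eta * x^T M + 0.5 * ||x - h_next||^2 equals the Euclidean
    projection of h_next - eta * M onto F_i.
    """
    h_next = h - eta * grad
    x_next = project_onto_feasible_set(h_next - eta * prediction, lo, up, S)
    return h_next, x_next

# Toy usage with T = 3 slots (hypothetical numbers).
lo, up, S = np.zeros(3), np.full(3, 2.0), 1.5
h_next, x_next = omd_step(np.zeros(3), np.array([1.0, 0.0, -1.0]), np.zeros(3), 1.0, lo, up, S)
```

The slot with the negative gradient receives most of the charge, while the slot with the positive gradient is pushed to its lower bound.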

As mentioned in Section II, the customers and the distribution company have different objectives and hence different regrets. We now switch to the perspective of the distribution company and define the regret minimization framework for the company.

Company perspective: For the distribution company, the decision variable is the aggregated charging profile $x^k$ on the $k$-th day. The set of aggregated feasible charging profiles is denoted by $\mathcal{F} := \mathcal{F}_1 \times \cdots \times \mathcal{F}_N$. The distribution company's regret after $K$ days is given by

$$R_u(K, x^k) := \sum_{k=1}^K c_u^k(x^k) - \min_{x \in \mathcal{F}} \sum_{k=1}^K c_u^k(x). \tag{8}$$

We define $x^*$ as

$$x^* := \arg\min_{x \in \mathcal{F}} \sum_{k=1}^K c_u^k(x). \tag{9}$$


The OMD algorithm generates the charging profile update that minimizes the regret (8) as

$$\begin{aligned} h_u^{k+1} &= \nabla L_u^{-1}\big(\nabla L_u(h_u^k) - \eta_u \nabla c_u^k(x^k)\big), \\ x^{k+1} &= \arg\min_{x \in \mathcal{F}} \ \eta_u x^\top M_u^{k+1} + D_{L_u}(x, h_u^{k+1}), \end{aligned} \tag{10}$$

where $\eta_u > 0$ is an algorithm parameter, $h_u^k$ is an intermediate update of the aggregated charging profile, and $M_u^k$ is the prediction of the gradient $\nabla c_u^k$. For ease of presentation, for the vector $h_u^k \in \mathbb{R}^{NT}$, $L_u$ is set to $L_u(h_u^k) = \frac{1}{2}\|h_u^k\|_2^2$. As an example, the prediction $M_u^k$ can be chosen as the average of the gradients of the cost functions of the previous days, namely, $M_u^k = \frac{1}{k-1}\sum_{\kappa=1}^{k-1} \nabla c_u^\kappa(x^\kappa)$.
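The averaged-gradient prediction does not require storing all past gradients; a running mean suffices. The small Python class below is a hypothetical helper, not part of the paper, and it assumes a zero prediction before any gradient has been observed, since the text does not specify the first-day prediction.

```python
import numpy as np

class AverageGradientPredictor:
    """Prediction M^k = (1 / (k - 1)) * (sum of the gradients observed on days 1, ..., k - 1).

    A running mean is kept, so each day costs O(dimension) time and memory.
    Before any gradient has been observed it predicts zero; the text does not
    specify M^1, so this default is an assumption.
    """

    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.count = 0

    def update(self, grad):
        """Fold in the gradient observed at the end of the current day."""
        self.count += 1
        self.mean += (np.asarray(grad, dtype=float) - self.mean) / self.count

    def predict(self):
        """Return the prediction for the next day."""
        return self.mean.copy()

predictor = AverageGradientPredictor(2)
predictor.update([1.0, 2.0])
predictor.update([3.0, 4.0])
prediction = predictor.predict()   # mean of the two observed gradients
```

The same helper serves the customers ($M_i^k \in \mathbb{R}^T$) and the distribution company ($M_u^k \in \mathbb{R}^{NT}$); only the dimension changes.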


A. Convergence Results 

The following results summarize the convergence of the 
charging profile updates generated by the OMD algorithm. 
All proofs can be found in the Appendix. 

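Proposition III.1 (Convergence of regret): For every $x_i^* \in \mathcal{F}_i$, the iteration (7) converges in the sense that

$$R_i(K, x_i^k) \le \frac{1}{\eta_i} P_i + \frac{\eta_i}{2} \sum_{k=1}^K \big\|\nabla c_i^k(x_i^k) - M_i^k\big\|_*^2, \tag{11}$$

where

$$R_i(K, x_i^k) := \sum_{k=1}^K c_i^k(x_i^k) - \sum_{k=1}^K c_i^k(x_i^*), \qquad P_i := \max_{x_i \in \mathcal{F}_i} L_i(x_i) - \min_{x_i \in \mathcal{F}_i} L_i(x_i). \tag{12}$$

In particular, if $\eta_i$ is chosen as $O(1/\sqrt{K})$ and the prediction errors $\|\nabla c_i^k(x_i^k) - M_i^k\|_*$ are uniformly bounded, then the average regret $R_i(K)/K$ converges to zero as $K \to \infty$.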
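Similarly, for every $x^* \in \mathcal{F}$, the iteration (10) converges in the sense that

$$R_u(K, x^k) \le \frac{1}{\eta_u} P_u + \frac{\eta_u}{2} \sum_{k=1}^K \big\|\nabla c_u^k(x^k) - M_u^k\big\|_*^2, \tag{13}$$

where

$$R_u(K, x^k) := \sum_{k=1}^K c_u^k(x^k) - \sum_{k=1}^K c_u^k(x^*), \qquad P_u := \max_{x \in \mathcal{F}} L_u(x) - \min_{x \in \mathcal{F}} L_u(x). \tag{14}$$

In particular, if $\eta_u$ is chosen as $O(1/\sqrt{K})$ and the prediction errors are uniformly bounded, then the average regret $R_u(K)/K$ converges to zero as $K \to \infty$.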
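The $O(1/\sqrt{K})$ step-size rule can be made explicit with a short calculation. Assume, in addition to the proposition, that the prediction errors are uniformly bounded, $\|\nabla c_i^k(x_i^k) - M_i^k\|_* \le G$ for all $k$ (the constant $G$ is our notation, not the paper's). Then the right-hand side of (11) is at most $P_i/\eta_i + \eta_i K G^2/2$, and minimizing this over $\eta_i$ gives

$$\eta_i^\star = \frac{1}{G}\sqrt{\frac{2P_i}{K}}, \qquad R_i(K, x_i^k) \le \frac{P_i}{\eta_i^\star} + \frac{\eta_i^\star K G^2}{2} = G\sqrt{\frac{P_i K}{2}} + G\sqrt{\frac{P_i K}{2}} = G\sqrt{2P_iK}, \qquad \frac{R_i(K, x_i^k)}{K} \le G\sqrt{\frac{2P_i}{K}} \xrightarrow[K\to\infty]{} 0.$$

The optimal step size is indeed of order $O(1/\sqrt{K})$, and the same calculation applied to (13) gives the analogous statement for $R_u$.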

Any online algorithm that yields a sublinear regret bound as in (11) and (13) is called a no-regret algorithm [?], [?]. The bounds (11) and (13) guarantee that, as the number of days increases, the average performance of the charging profiles generated by the OMD algorithm approaches the performance obtained by the charging profiles $x_i^*$, $i \in \mathcal{N}$, and $x^*$, respectively. Note that while the charging profiles $x_i^*$, $i \in \mathcal{N}$, and $x^*$ may not solve the optimization problems (3) and (1), respectively, they are optimal for the related problems (5) and (9), respectively. In Section IV-A, we compare the performance of the charging profiles generated by the OMD algorithm with the performance obtained by the solutions of problems (1) and (3). However, in that case, the convergence guarantees that are obtained are weaker.

B. Design of the Pricing Function 

There is no guarantee that the solutions $x_i^*$, $i \in \mathcal{N}$, of problem (5) together solve problem (9). In fact, unless the pricing function $c_i^k$ is carefully designed, these solutions will not coincide, since the objectives of the distribution company and the EV customers are different. After some algebraic manipulation of the updates (7) and (10), we observe that the natural choice of $c_i^k$ as

$$c_i^k(x_i^k) = \Big(\sum_{j=1}^N x_j^k + D^k\Big)^\top x_i^k \tag{15}$$

does not lead to charging profiles $(x_1^*, \dots, x_N^*)$ that reduce the regret of the distribution company to zero.

We now propose a choice of $c_i^k$ that ensures that when each customer minimizes her regret, the aggregated charging profile minimizes the distribution company's regret.

Proposition III.2: If $c_i^k$ is chosen as

$$c_i^k(x_i^k) = \Big(-\frac{1}{2}x_i^k + \sum_{j=1}^N x_j^k + D^k\Big)^\top x_i^k, \quad i \in \mathcal{N}, \tag{16}$$

the customers adopt the iteration (7), and $\eta_u = \frac{1}{2}\eta_i$, then the average regret of the distribution company as defined in (8) converges to zero as the total number of days goes to infinity.

To update the charging profile on day $k$, the $i$-th customer needs to know $2x_i^{k-1} + \sum_{j \neq i} x_j^{k-1} + D^{k-1}$ or $\sum_{j=1}^N x_j^{k-1} + D^{k-1}$, depending on whether the pricing function (15) or (16) is adopted. The distribution company can simply publish the total load information for the previous day. The customers do not need to have full knowledge about how their consumption will map to a corresponding expenditure.
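The key property behind Proposition III.2 is that, under (16), the gradient of $c_i^k$ with respect to $x_i^k$ is the total load $\sum_j x_j^k + D^k$, while the gradient of (2) with respect to the same block is twice the total load; this is why $\eta_u = \frac{1}{2}\eta_i$ makes the customers' updates coincide with the company's update (10). The Python sketch below checks this with central finite differences on random toy data. It is a numerical sanity check under these assumptions, not a proof, and the function names are hypothetical.

```python
import numpy as np

def customer_cost(i, profiles, base_load):
    """Pricing function (16): c_i^k = (-0.5 * x_i + sum_j x_j + D)^T x_i."""
    x_i = profiles[i]
    return float((-0.5 * x_i + profiles.sum(axis=0) + base_load) @ x_i)

def company_cost(profiles, base_load):
    """Valley-filling cost (2): sum_t (D(t) + sum_j x_j(t))^2."""
    return float(np.sum((base_load + profiles.sum(axis=0)) ** 2))

def grad_wrt_customer(cost, i, profiles, eps=1e-6):
    """Central-difference gradient of cost(profiles) with respect to row i only."""
    grad = np.zeros(profiles.shape[1])
    for t in range(profiles.shape[1]):
        plus, minus = profiles.copy(), profiles.copy()
        plus[i, t] += eps
        minus[i, t] -= eps
        grad[t] = (cost(plus) - cost(minus)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
profiles = rng.uniform(0.0, 2.0, size=(3, 4))    # toy data: N = 3 EVs, T = 4 slots
base_load = rng.uniform(5.0, 10.0, size=4)
i = 1
total = base_load + profiles.sum(axis=0)
g_customer = grad_wrt_customer(lambda p: customer_cost(i, p, base_load), i, profiles)
g_company = grad_wrt_customer(lambda p: company_cost(p, base_load), i, profiles)
# Under (16) the customer's gradient equals the total load; the company's gradient
# with respect to x_i equals twice the total load.
```

Replacing `customer_cost` with the natural choice (15) yields the gradient $2x_i^k + \sum_{j\neq i} x_j^k + D^k$, which is not proportional to the company's gradient.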

IV. Extensions 

The basic framework presented above can be extended in various directions. These include considering a different definition of regret and incorporating customers who vary in the elasticity of shifting their EV charging load in response to price.

To ensure that the distribution company's average regret converges and that the regret is of order $O(\sqrt{K})$, in the following discussion we assume that the distribution company selects the cost function (16) for each EV customer and sets $\eta_u = \frac{1}{2}\eta_i$, $i \in \mathcal{N}$.


A. Regret with Respect to the Optimal Charging Profiles 

The regrets defined in (8) and (6) measure the difference between the performance of the charging profiles generated by our algorithm and the performance obtained by the charging profiles $x^*$ and $x_i^*$, $i \in \mathcal{N}$, which are the solutions of the related optimization problems (9) and (5), respectively. We can instead consider the original optimization problems (1) and (3) to define the tracking regret after $K$ days as

$$R_u^{\mathrm{tracking}}(K, x^k) := \sum_{k=1}^K c_u^k(x^k) - \min_{x^k \in \mathcal{F},\ k \in \mathcal{K}} \sum_{k=1}^K c_u^k(x^k), \tag{17}$$

where $\mathcal{K} := \{1, \dots, K\}$. We define the set $\{x^{k*}, k \in \mathcal{K}\}$ as

$$\{x^{k*},\ k \in \mathcal{K}\} \in \arg\min_{x^k \in \mathcal{F},\ k \in \mathcal{K}} \sum_{k=1}^K c_u^k(x^k). \tag{18}$$

This notion of tracking regret characterizes the difference between the cumulative cost of the charging profiles generated by our algorithm and the cumulative cost of executing the optimal charging profiles, which can be calculated only in hindsight. For comparison, we refer to the regrets (6) and (8) as static regrets.

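Theorem IV.1: For every $x^{k*} \in \mathcal{F}$, $k \in \mathcal{K}$, the OMD algorithm yields

$$\begin{aligned} R_u^{\mathrm{tracking}}(K, x^k) \le{} & \frac{1}{\eta_u}\Big[L_u(h_u^{K+1}) - L_u(h_u^1)\Big] + \frac{1}{\eta_u}\Big[\nabla L_u(h_u^{K+1})^\top (x^{(K+1)*} - h_u^{K+1}) - \nabla L_u(h_u^1)^\top (x^{1*} - h_u^1)\Big] \\ & + \frac{1}{\eta_u} \max_{k \in \mathcal{K}} \big\|\nabla L_u(h_u^k)\big\|_* \sum_{k=1}^K \big\|x^{k*} - x^{(k+1)*}\big\| + \frac{\eta_u}{2} \sum_{k=1}^K \big\|\nabla c_u^k(x^k) - M_u^k\big\|_*^2. \end{aligned} \tag{19}$$

In particular, if $\eta_u = O(1/\sqrt{K})$, then the tracking regret is of order $O\big(\sqrt{K}\,[1 + \sum_{k=1}^K \|x^{k*} - x^{(k+1)*}\|]\big)$.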

A comparison of the regret bounds (13) and (19) is of interest. If $\eta_u = O(1/\sqrt{K})$, then in (13) the first term $\frac{1}{\eta_u}P_u$ and the second term $\frac{\eta_u}{2}\sum_{k=1}^K \|\nabla c_u^k(x^k) - M_u^k\|_*^2$ are of order $O(\sqrt{K})$. In (19), the first three terms measure the difference between the initial iterate $h_u^1$ and the final iterate $h_u^{K+1}$, the difference between the final iterate $h_u^{K+1}$ and the optimal solution $x^{(K+1)*}$, and the difference between the initial iterate $h_u^1$ and the optimal solution $x^{1*}$. These three terms are of order $O(\sqrt{K})$. The last term, $\frac{\eta_u}{2}\sum_{k=1}^K \|\nabla c_u^k(x^k) - M_u^k\|_*^2$, also appears in (13) and is of order $O(\sqrt{K})$. The fourth term grows with $K$ and is of order $O(\sqrt{K}\sum_{k=1}^K \|x^{k*} - x^{(k+1)*}\|)$. Because of this fourth term, the regret bound in (19) increases as the variation $\sum_{k=1}^K \|x^{k*} - x^{(k+1)*}\|$ of the optimal sequence of decisions increases. If the optimal solution remains the same from one day to the next, then $\sum_{k=1}^K \|x^{k*} - x^{(k+1)*}\| = 0$ and the tracking regret $R_u^{\mathrm{tracking}}$ is of order $O(\sqrt{K})$. On the other hand, if the optimal solution varies significantly from one day to the next, then the tracking regret $R_u^{\mathrm{tracking}}$ will be of order $O(\sqrt{K}[1 + \sum_{k=1}^K \|x^{k*} - x^{(k+1)*}\|])$, and the average tracking regret will not necessarily converge to zero.
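The role of the variation term can be illustrated numerically. The Python sketch below (hypothetical helper names, toy sequences) computes the path length $\sum_{k=1}^K \|x^{k*} - x^{(k+1)*}\|$ of a sequence of optimal aggregated profiles and the order term $\sqrt{K}(1 + \text{path length})$ from Theorem IV.1, with all constants dropped.

```python
import numpy as np

def path_length(optimal_profiles):
    """Variation sum_k ||x^{k*} - x^{(k+1)*}|| of a sequence of optimal profiles (Euclidean norm).

    optimal_profiles has shape (K + 1, N * T): one flattened aggregated profile per day,
    including the profile x^{(K+1)*} that appears in (19).
    """
    steps = np.diff(optimal_profiles, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())

def tracking_regret_order(K, variation):
    """Order term sqrt(K) * (1 + variation) from Theorem IV.1, with all constants dropped."""
    return float(np.sqrt(K) * (1.0 + variation))

constant = np.tile([1.0, 2.0], (4, 1))                 # the optimum never changes
drifting = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
```

Dividing the order term by $K$ shows that the average tracking regret bound vanishes only if the path length grows more slowly than $\sqrt{K}$.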

If the distribution company has a perfect prediction of the gradient of the cost function, i.e., $\nabla c_u^k(x^k) = M_u^k$ for all $k$, then the last term in (19) vanishes, and the distribution company can let $\eta_u \to \infty$ so that the regret bound becomes zero, i.e., $R_u^{\mathrm{tracking}} \le 0$. Since the tracking regret (17) is nonnegative by definition, this indicates that the cumulative cost of the charging profiles $x^k$, $k = 1, \dots, K$, generated by the online algorithm (10) and that of the elements of the set $\{x^{k*} \in \mathbb{R}^{NT}, k \in \mathcal{K}\}$ are identical. Note that the elements of the set $\{x^{k*} \in \mathbb{R}^{NT}, k \in \mathcal{K}\}$ solve problem (1).


B. Presence of Inelastic Customers 

The discussion so far assumed that all customers were rational in the sense that they wanted to choose their charging profiles to solve (3). Furthermore, they were elastic in scheduling their charging (within the constraints prespecified by $x_i^{\mathrm{low}}(t)$, $x_i^{\mathrm{up}}(t)$, and $S_i$). We now assume that some customers are either irrational or inelastic and do not optimize their schedules to solve (3). Suppose that $N_l$ out of the $N$ customers are inelastic, and denote the set of inelastic customers by $\mathcal{N}_l$. For every inelastic customer $i \in \mathcal{N}_l$, we assume that her charging profile remains the same from day to day and is not updated to minimize (6). Equivalently, for the inelastic customers, the cost function $c_i^k$ can be selected as $c_i^k = r$ for all $k$, where $r$ is an arbitrary constant.

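Since the inelastic customers do not make any predictions, we set all customers' predictions to zero, i.e., $M_i^k = 0$, $i \in \mathcal{N}$, $k \in \mathbb{N}_{>0}$. The update of the charging profile for an inelastic customer $i \in \mathcal{N}_l$ can thus be written as

$$\begin{aligned} h_i^{k+1} &= \nabla L_i^{-1}\big(\nabla L_i(h_i^k) - \eta_i(\nabla c_i^k(x_i^k) + e_i^k)\big), \\ x_i^{k+1} &= \arg\min_{x_i \in \mathcal{F}_i} D_{L_i}(x_i, h_i^{k+1}), \end{aligned} \tag{20}$$

where $e_i^k$ is an error term. For instance, for the cost function (16), the error $e_i^k$ is equal in magnitude to the total load $\sum_{j\in\mathcal{N}} x_j^k + D^k$ on the $k$-th day. This error term quantifies the inconsistency between the update that the distribution company would like each customer to execute and the inelastic customer's behavior. Denote by $e^k$ the aggregate error, i.e., $e^k := (e_1^k, \dots, e_N^k)$.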

Due to the presence of the inelastic customers, the ability of the aggregated solution to be valley-filling, and hence to minimize the cost function in problem (1), is reduced. The resulting performance loss is characterized in the following result.


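Theorem IV.2: Consider $N$ EV customers, of which $N_l$ are inelastic. If $\eta_u = \frac{1}{2}\eta_i$, $i \in \mathcal{N}$, then for every $x^* \in \mathcal{F}$, the regret of the distribution company is bounded as

$$R_u(K, x^k) \le \frac{1}{\eta_u}P_u + \frac{\eta_u}{2}\sum_{k=1}^K \big\|\nabla c_u^k(x^k) + e^k\big\|_*^2 + K\sum_{i\in\mathcal{N}_l} \|\mathcal{F}_i\|\,\|e_i\|, \tag{21}$$

where

$$P_u := \max_{x\in\mathcal{F}} L_u(x) - \min_{x\in\mathcal{F}} L_u(x), \qquad \|\mathcal{F}_i\| := \max_{x,y\in\mathcal{F}_i}\|x - y\|, \qquad \|e_i\| := \max_k \|e_i^k\|, \quad i \in \mathcal{N}_l. \tag{22}$$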


By (21), if $\eta_u = O(1/\sqrt{K})$ and the errors are bounded, the average regret is asymptotically bounded by a constant, i.e., $\limsup_{K\to\infty} R_u(K)/K \le \sum_{i\in\mathcal{N}_l}\|\mathcal{F}_i\|\,\|e_i\|$. The size of this constant depends on the error terms $e_i$, $i \in \mathcal{N}_l$, and on the charging constraints of the inelastic customers. The result explicitly quantifies the deviation from the desired performance in terms of the asymptotic average regret of the distribution company. To obtain further insight into the effect of the inelastic customers, we proceed as follows.

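Note that the bound (21) depends on the size of the term $\|e_i\|$. For an inelastic customer $i \in \mathcal{N}_l$, the error term $\|e_i\|$ can be bounded as

$$\|e_i\| = \max_k \Big\|\sum_{j\in\mathcal{N}} x_j^k + D^k\Big\| \le \max_k \Big(\sum_{j\in\mathcal{N}} \|x_j^k\| + \|D^k\|\Big) \le \sum_{j\in\mathcal{N}} \|x_j^{\mathrm{up}}\| + \max_k \|D^k\|, \tag{23}$$

where the last inequality holds when the charging rates are nonnegative, i.e., $x_j^{\mathrm{low}}(t) \ge 0$.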


Assume that the Euclidean ball $\mathcal{B}$ is the smallest Euclidean ball containing the set of feasible charging schedules $\mathcal{F}_i$, and let $r$ be the radius of $\mathcal{B}$. Then the term $\|\mathcal{F}_i\|$ can be further bounded by $2r$. Since the set $\mathcal{F}_i$ is a polytope, the algorithm in [?] can be adopted to compute this bound efficiently.
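The constant in Theorem IV.2 can be estimated from quantities the distribution company already knows. The Python sketch below (hypothetical names, toy data) evaluates the right-hand side of (23) and, instead of the smallest enclosing ball, uses the cruder bound $\|\mathcal{F}_i\| \le \|x_i^{\mathrm{up}} - x_i^{\mathrm{low}}\|$. That bound holds because $\mathcal{F}_i$ lies inside the box $[x_i^{\mathrm{low}}, x_i^{\mathrm{up}}]$, and it is never tighter than $2r$.

```python
import numpy as np

def error_norm_bound(up_profiles, base_loads):
    """Right-hand side of (23): sum_j ||x_j^up|| + max_k ||D^k|| (Euclidean norms).

    up_profiles has shape (N, T); base_loads has shape (K, T).
    Valid when the charging rates are nonnegative, so that ||x_j^k|| <= ||x_j^up||.
    """
    return float(np.linalg.norm(up_profiles, axis=1).sum()
                 + np.linalg.norm(base_loads, axis=1).max())

def box_diameter_bound(lo, up):
    """Crude bound ||F_i|| <= ||up - lo||, which holds because F_i lies inside the box [lo, up]."""
    return float(np.linalg.norm(up - lo))

def average_regret_constant(boxes, error_bounds):
    """Upper bound on sum_{i in N_l} ||F_i|| * ||e_i|| from Theorem IV.2, using the box bound."""
    return sum(box_diameter_bound(lo, up) * e for (lo, up), e in zip(boxes, error_bounds))

up_profiles = np.full((2, 4), 2.0)                          # toy data: N = 2 EVs, T = 4
base_loads = np.array([[3.0, 4.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
e_bound = error_norm_bound(up_profiles, base_loads)
constant = average_regret_constant([(np.zeros(4), np.full(4, 2.0))], [e_bound])
```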

Notice that if $N_l = 0$, (21) reduces to (13) (without the prediction, i.e., $M_u^k = 0$ for all $k = 1, \dots, K$). However, when $N_l > 0$, it becomes difficult to ensure the feasibility of problem (1). To improve the ability of the aggregated solution to be valley-filling in the presence of inelastic customers, the distribution company can consider using some loads that can be controlled completely. We now discuss this option.


C. Controllable Customers

On the other end of the spectrum from customers that are completely inelastic are customers that are under the complete control of the distribution company. Customers under complete control adopt the charging profiles assigned by the company. The charging constraints of these customers are also known to and controlled by the distribution company. In practice, the distribution company can offer a subset of customers a contract with a special price for acting as such a controllable load. This contract-based direct load control has been implemented by many distribution companies, e.g., [?], [?].

To introduce such directly controlled customers in our formulation, we allow the charging constraints of the controllable customers to be relaxed (thus enlarging their sets of feasible charging profiles). Denote the set of controllable customers by $\mathcal{N}_c$. On day $k$, for a controllable customer $i \in \mathcal{N}_c$, the set of relaxed feasible charging profiles $\tilde{\mathcal{F}}_i$ is defined as

$$\tilde{\mathcal{F}}_i := \Big\{ \tilde{x}_i \in \mathbb{R}^T \;\Big|\; \tilde{x}_i^{\mathrm{low}}(t) \le \tilde{x}_i(t) \le \tilde{x}_i^{\mathrm{up}}(t),\ t \in \mathcal{T},\ a_i \sum_{t\in\mathcal{T}} \tilde{x}_i(t) = \tilde{S}_i \Big\}, \tag{24}$$

where $a_i \in \{0, 1\}$, $\tilde{x}_i^k$ denotes a feasible charging profile in the set $\tilde{\mathcal{F}}_i$, and $\tilde{x}_i^{\mathrm{low}}$, $\tilde{x}_i^{\mathrm{up}}$, and $\tilde{S}_i$ are the relaxed minimum charging rate, the relaxed maximum charging rate, and the relaxed total charge, respectively. The aggregated relaxed charging profile $\tilde{x}^k$ of the EV customers consists of all elements in the set

$$\{\tilde{x}_i^k \in \mathbb{R}^T \mid i \in \mathcal{N}_c\} \cup \{x_i^k \in \mathbb{R}^T \mid i \in \mathcal{N}_l \cup \mathcal{N}_p\}, \tag{25}$$

where $\mathcal{N}_p$ denotes the set of price-sensitive customers.

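For a controllable customer $i \in \mathcal{N}_c$, we define $\tilde{x}_i^*$ as

$$\tilde{x}_i^* := \arg\min_{\tilde{x}_i \in \tilde{\mathcal{F}}_i} \sum_{k=1}^K c_i^k(\tilde{x}_i). \tag{26}$$

We use $\tilde{\mathcal{F}}$ to denote the set of feasible aggregated charging profiles including the controllable customers. We define $\tilde{x}^*$ as

$$\tilde{x}^* := \arg\min_{x \in \tilde{\mathcal{F}}} \sum_{k=1}^K c_u^k(x). \tag{27}$$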

There are different ways to relax the set of feasible charging profiles for the controllable customers. For example, if we select $a_i = 0$ and $\tilde{S}_i = 0$, then the equality constraint is removed from the set of feasible charging profiles of the $i$-th customer, $i \in \mathcal{N}_c$; that is, the $i$-th controllable customer waives her daily total charging requirement. We can also relax the charging deadline by adjusting $\tilde{x}_i^{\mathrm{low}}(t)$ and $\tilde{x}_i^{\mathrm{up}}(t)$.

By enlarging the set of feasible charging profiles of a controllable customer $i \in \mathcal{N}_c$, the cumulative cost of the iterates generated by the iteration (7) becomes a lower bound on the cumulative cost of the iterates obtained by the same iteration (7) without relaxation.

We now propose Algorithm 1, which uses the above sets of relaxed feasible charging schedules $\tilde{\mathcal{F}}_i$, $i \in \mathcal{N}_c$, to compensate for the performance loss due to the presence of inelastic customers. We consider that the controllable customers relax their charging constraints for the final $J$ days.


Algorithm 1

Require: The distribution company knows $\mathcal{N}_p$, $\mathcal{N}_l$, $\mathcal{N}_c$, $\tilde{\mathcal{F}}_i$, $i \in \mathcal{N}_c$, and $J$, and sets $\eta_u = 1/(2\sqrt{K})$. Each price-sensitive customer $i \in \mathcal{N}_p$ knows $\mathcal{F}_i$ and sets $\eta_i = 1/\sqrt{K}$. Each inelastic customer $i \in \mathcal{N}_l$ knows $\mathcal{F}_i$. Each controllable customer $i \in \mathcal{N}_c$ knows $\mathcal{F}_i$, $\tilde{\mathcal{F}}_i$, and $J$, and sets $\eta_i = 1/\sqrt{K}$.

1: Initialization: $k \leftarrow 1$, $x_i^k(t) \leftarrow S_i/T$ for all $i \in \mathcal{N}$, $t \in \mathcal{T}$
2: for $k = 1$ to $K$ do
3:   At the end of the $k$-th day, the distribution company gathers the charging profiles $x_i^k$, $i \in \mathcal{N}$, computes the prices $p^k(x^k) = D^k + \sum_{i\in\mathcal{N}} x_i^k$, and sends the prices to the price-sensitive customers and the controllable customers
4:   At the end of the $k$-th day, every price-sensitive customer $i \in \mathcal{N}_p$ updates the charging profile by $h_i^{k+1} = h_i^k - \eta_i p^k$, $x_i^{k+1} = \arg\min_{x_i \in \mathcal{F}_i} \|x_i - h_i^{k+1}\|_2^2$
5:   At the end of the $k$-th day, every inelastic customer $i \in \mathcal{N}_l$ updates the charging profile by $x_i^{k+1} = x_i^k$
6:   At the end of the $k$-th day,
7:   if $1 \le k \le K - J$ then
8:     every controllable customer $i \in \mathcal{N}_c$ updates the charging profile by $h_i^{k+1} = h_i^k - \eta_i p^k$, $x_i^{k+1} = \arg\min_{x_i \in \mathcal{F}_i} \|x_i - h_i^{k+1}\|_2^2$
9:   else
10:    every controllable customer $i \in \mathcal{N}_c$ updates the charging profile by $h_i^{k+1} = h_i^k - \eta_i p^k$, $\tilde{x}_i^{k+1} = \arg\min_{\tilde{x}_i \in \tilde{\mathcal{F}}_i} \|\tilde{x}_i - h_i^{k+1}\|_2^2$
11:   end if
12: end for
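The steps of Algorithm 1 can be simulated in a few lines. The Python sketch below is a toy simulation under stated assumptions, not the authors' code: the function names are hypothetical, $h_i^1 = x_i^1$ is assumed (the text does not say how $h$ is initialized), the projections use bisection on the multiplier of the sum constraint, and setting $\tilde{S}_i$ to `None` models the relaxation $a_i = 0$, where the projection onto $\tilde{\mathcal{F}}_i$ reduces to clipping.

```python
import numpy as np

def project(h, lo, up, S, iters=100):
    """Euclidean projection onto {lo <= x <= up, sum(x) = S}, by bisection on the multiplier."""
    a, b = np.min(h - up), np.max(h - lo)
    for _ in range(iters):
        nu = 0.5 * (a + b)
        if np.clip(h - nu, lo, up).sum() > S:
            a = nu
        else:
            b = nu
    return np.clip(h - 0.5 * (a + b), lo, up)

def run_algorithm_1(base_loads, lo, up, S, kinds, relaxed=None, J=0):
    """Simulate Algorithm 1 for K = base_loads.shape[0] days.

    kinds[i] is "price", "inelastic", or "control". relaxed[i] = (lo_r, up_r, S_r) describes
    the relaxed set of controllable customer i; S_r = None drops the equality (a_i = 0).
    Assumption: h_i^1 = x_i^1, since the text does not state how h is initialized.
    Returns the profiles after the last update and the daily price vectors.
    """
    relaxed = relaxed or {}
    K, T = base_loads.shape
    eta = 1.0 / np.sqrt(K)                                   # eta_i = 1 / sqrt(K)
    x = np.array([np.full(T, s / T) for s in S])             # step 1: uniform profiles
    h = x.copy()
    prices = []
    for k in range(K):                                       # k = 0 is day 1
        p = base_loads[k] + x.sum(axis=0)                    # step 3: p^k = D^k + sum_i x_i^k
        prices.append(p)
        for i, kind in enumerate(kinds):
            if kind == "inelastic":                          # step 5: profile unchanged
                continue
            h[i] = h[i] - eta * p                            # steps 4, 8, 10
            if kind == "control" and k >= K - J:             # step 10: final J days
                lo_r, up_r, S_r = relaxed[i]
                x[i] = np.clip(h[i], lo_r, up_r) if S_r is None else project(h[i], lo_r, up_r, S_r)
            else:                                            # steps 4 and 8
                x[i] = project(h[i], lo[i], up[i], S[i])
    return x, np.array(prices)

# Toy run (hypothetical data): T = 4 slots, K = 50 days, two price-sensitive EVs
# and one inelastic EV, each needing S_i = 2 with rates in [0, 2].
K, T = 50, 4
base_loads = np.tile([5.0, 1.0, 1.0, 5.0], (K, 1))
lo = [np.zeros(T)] * 3
up = [np.full(T, 2.0)] * 3
S = [2.0, 2.0, 2.0]
x_final, daily_prices = run_algorithm_1(base_loads, lo, up, S, ["price", "price", "inelastic"])
final_load = base_loads[-1] + x_final.sum(axis=0)
```

In this toy run, the two price-sensitive customers move their charge into the two low-load slots, while the inelastic customer keeps her uniform profile.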


The following result verifies that Algorithm 1 is a no-regret algorithm even in the presence of inelastic customers.

Theorem IV.3: Consider $N$ EV customers, of which $N_l$ are lazy (i.e., inelastic) and $N_c$ are controllable. Assume that the controllable customers relax their charging constraints for the final $J$ days, that $\eta_u = \frac{1}{2}\eta_i$ for all $i \in \mathcal{N}$, and that the condition

E [ E W) 

k —1 i£Afi 

+ E [<£(**) - - E(*? - <) T (4) 

k—K— J+l ^ iEA/j 


<o, 

(28) 


holds. Then the average regret of the distribution company is 
bounded as 


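$$\frac{R_u(K)}{K} \le \frac{1}{K}\Big[\frac{1}{\eta_u}P_u + \frac{\eta_u}{2}\sum_{k=1}^{K-J}\big\|\nabla c_u^k(x^k) + e^k\big\|_*^2\Big] + \frac{1}{K}\Big[\frac{1}{\eta_u}\tilde{P}_u + \frac{\eta_u}{2}\sum_{k=K-J+1}^{K}\big\|\nabla c_u^k(\tilde{x}^k) + e^k\big\|_*^2\Big], \tag{29}$$

where

$$\tilde{P}_u := \max_{x\in\tilde{\mathcal{F}}} L_u(x) - \min_{x\in\tilde{\mathcal{F}}} L_u(x). \tag{30}$$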


In particular, if $\eta_u = O(1/\sqrt{K})$, then the average regret converges to zero as $K \to \infty$.
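To see how a step size proportional to $1/\sqrt{K}$ drives the average regret to zero, consider a bound of the generic form $P_u/\eta_u + \eta_u K G^2$, where $G$ is a hypothetical uniform bound on the gradient terms. This form is an assumption used for illustration; the exact constants of (29) are not reproduced. With $\eta_u = c/\sqrt{K}$, dividing by $K$ gives $(P_u/c + c\,G^2)/\sqrt{K}$. The function name `avg_regret_bound` is hypothetical.

```python
import math


def avg_regret_bound(K, P, G, c=1.0):
    """(P/eta + eta*K*G**2) / K with eta = c/sqrt(K); an assumed generic bound."""
    eta = c / math.sqrt(K)
    return (P / eta + eta * K * G ** 2) / K


for K in (100, 400, 1600):
    # Each fourfold increase in K halves the bound: it decays like 1/sqrt(K).
    print(K, avg_regret_bound(K, P=1.0, G=1.0))
```

Within this generic form, the constant $c = \sqrt{P_u}/G$ minimizes $P_u/c + c\,G^2$, giving $2G\sqrt{P_u}/\sqrt{K}$.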


The distribution company needs to design the value $J$ and the sets $\tilde{\mathcal{F}}_i$, $i \in \mathcal{N}_c$, to ensure that condition (28) is satisfied. Condition (28) imposes requirements on the sets of relaxed feasible charging profiles and on the number of days over which the controllable customers relax their charging constraints. If the base load remains the same from day to day and the distribution company knows the base load, the values $c_u^k(\tilde{x}^*)$ and $c_u^k(x^*)$, $k \in \mathbb{N}_{>0}$, can be computed, for example, using the algorithms proposed in [9], [10]. The distribution company can verify condition (28) using upper bounds on $\|x_i^*\|$, $\|\tilde{x}_i^*\|$, and $\|e_i^k\|$, $i \in \mathcal{N}_l$, $k \in \mathbb{N}_{>0}$, as follows. Given these upper bounds, the distribution company requires that the value $J$ and the sets $\tilde{\mathcal{F}}_i$, $i \in \mathcal{N}_c$, satisfy the inequality


K 


E 

k=K-J+l 



> E E (iw 

k=K- J+l ieMi V 





(31) 
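One way to use only the three norm bounds is the Cauchy–Schwarz and triangle inequalities, $|(x_i^* - \tilde{x}_i^*)^\top e_i^k| \le (\|x_i^*\| + \|\tilde{x}_i^*\|)\,\|e_i^k\|$, where $\tilde{x}_i^*$ denotes the relaxed counterpart of $x_i^*$ (notation assumed here). The sketch below assumes that the lazy-customer terms over the last $J$ days enter the test as such inner products and that the other side collects the daily cost gains $c_u^k(x^*) - c_u^k(\tilde{x}^*)$; both are assumptions about the structure of (28) and (31). The function names and the uniform bounds `bx`, `bx_tilde`, and `be` are hypothetical.

```python
import numpy as np


def lazy_term_bound(bx, bx_tilde, be, num_lazy, J):
    """Worst case of the sum, over the last J days and all lazy customers, of
    |(x_i* - x~_i*)^T e_i^k|, given uniform bounds on the three norms."""
    return J * num_lazy * (bx + bx_tilde) * be


def relaxation_is_sufficient(daily_gains, bx, bx_tilde, be, num_lazy):
    """Compare the summed gains c_u^k(x*) - c_u^k(x~*) over the last J days
    with the worst-case lazy-customer term (assumed form of the test)."""
    J = len(daily_gains)
    return sum(daily_gains) >= lazy_term_bound(bx, bx_tilde, be, num_lazy, J)


# Sanity check of the inequality on random vectors (T = 24 slots).
rng = np.random.default_rng(1)
a, b, e = rng.normal(size=(3, 24))
lhs = abs((a - b) @ e)
rhs = (np.linalg.norm(a) + np.linalg.norm(b)) * np.linalg.norm(e)
```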


V. Numerical Examples 

Assume that there are 20 customers. Each time slot represents an interval of 30 minutes, and there are $T = 24$ time slots. The starting time is set to 8:00 pm. For simplicity, we consider that all EV customers charge their EVs from the 9th to the 16th time slots.¹ On the first day, the initial charging profiles are assumed to be uniformly distributed over the time slots. The maximum charging rate is set to 2 kW and the desired sum to $S_i = 10$ kW for all $i \in \mathcal{N}$. The simulation is carried out for a total of $K = 200$ days. We set the parameters $\eta_i = 0.05/\sqrt{K}$, $i \in \mathcal{N}$. We first examine the convergence of the static regret. The base load profile is given in Figure 1. Figure 2 shows the trajectories of the average regrets with and without the prediction. The prediction $M_i^k$, $i \in \mathcal{N}$, $k \in \mathbb{N}_{>0}$, is set to the empirical average, over days $1, \ldots, k-1$, of the quantity it predicts (that is, $1/(k-1)$ times the sum of the observed values). Figure 2 shows that the average regrets converge to zero and that the average regret with the prediction converges faster than the one without the prediction. Figure 3 shows the static regrets with and without the prediction. The regrets are sublinear functions of the number of days, which verifies the results in Proposition III.1.

¹The problem formulation allows EV customers to have different charging constraints. The initial and final charging times mainly depend on the preferences of the EV customers.

Figure 1. Base load profile from 8:00 pm to 7:30 am (the next day).

Figure 2. Average regrets generated by OMD with and without prediction.
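A running-average prediction can be maintained incrementally, without storing past days. The sketch below is hypothetical: the class and function names are illustrative, it assumes the averaged quantity is a length-$T$ vector observed once per day (for example, the published price profile), and it returns a zero vector before any day has been observed. It also shows how a prediction $M$ enters an update of the form $\arg\min_x \eta\, x^\top M + \|x - h\|^2$: completing the square gives the unconstrained minimizer $h - \eta M/2$, which is then projected onto the feasible set.

```python
import numpy as np


class RunningAveragePredictor:
    """M^k = (1/(k-1)) * sum_{l=1}^{k-1} v^l, updated incrementally."""

    def __init__(self, T):
        self.mean = np.zeros(T)
        self.count = 0

    def update(self, v):
        # Incremental mean: m <- m + (v - m) / n.
        self.count += 1
        self.mean += (np.asarray(v, dtype=float) - self.mean) / self.count

    def predict(self):
        return self.mean.copy()


def optimistic_point(h, M, eta):
    """Unconstrained minimizer of eta * x^T M + ||x - h||^2 (before projection)."""
    return h - 0.5 * eta * M


pred = RunningAveragePredictor(T=3)
for v in ([1.0, 2.0, 3.0], [3.0, 4.0, 5.0]):
    pred.update(v)
```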

We now consider a base load profile that does not remain the same from day to day. The base load is assumed to switch between two base load profiles (see Figure 4). We set the parameters $\eta_i = 0.005/\sqrt{K}$, $i \in \mathcal{N}$. Figure 5 shows the trajectories of the average regrets with and without the prediction, given the varying base load in Figure 4. The average regrets converge to zero, and the average regret with the prediction converges faster than the one without the prediction. Figure 6 shows the static regrets with and without the prediction; the regrets are sublinear functions of the number of days. The results in Figures 5 and 6 indicate that, even though the base load switches and the distribution company is not aware of this varying behavior, our algorithm still produces charging-profile updates whose regret converges. That the average regret converges to zero means that, in the long term, the average performance of the charging schedules $x^k$ generated by the OMD algorithm approaches the performance obtained by $x^*$, where $x^*$ solves ©.

Figure 3. Static regrets generated by OMD with and without prediction.

[Plot axis: Time of Day, 20:00 to 7:30]

Figure 4. Two different base load profiles. The actual base load is realized by switching between the two base load profiles from day to day.

Figure 5. Average regrets generated by OMD with and without prediction given the varying base load profile in Figure 4.

Figure 6. Static regrets generated by OMD with and without prediction given the varying base load profile in Figure 4.
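The static and average regrets plotted in the figures can be computed from per-day costs. A minimal sketch, assuming the distribution company's daily cost is $c_u^k(x) = \|D^k + \sum_i x_i\|^2/(NT)$, a form chosen so that its gradient with respect to each $x_i$ is $2(D^k + \sum_j x_j)/(NT)$; the function names are hypothetical. The static regret compares the realized costs with those of a fixed schedule $x^*$ evaluated on every day.

```python
import numpy as np


def company_cost(D, x):
    """Assumed daily cost ||D + sum_i x_i||^2 / (N T) for x of shape (N, T)."""
    N, T = x.shape
    total = D + x.sum(axis=0)
    return float(total @ total) / (N * T)


def static_regret(costs, comparator_costs):
    """R(K) = sum_k c^k(x^k) - sum_k c^k(x*)."""
    return float(np.sum(costs) - np.sum(comparator_costs))


def average_regret_curve(costs, comparator_costs):
    """R(k)/k for k = 1, ..., K."""
    gaps = np.cumsum(np.asarray(costs, float) - np.asarray(comparator_costs, float))
    return gaps / np.arange(1, len(gaps) + 1)
```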

Figure 7 shows the total load profiles at convergence for various numbers of inelastic customers. As the number of inelastic customers increases, the variation of the total load profile increases, which indicates that the inelastic customers perturb the valley-filling load profile. Now suppose that there are 10 inelastic customers and 10 customers with relaxed constraints. We consider two different relaxation strategies. Relaxation 1 allows the EVs to be charged over all time slots (rather than only between the 9th and the 16th time slots), whereas Relaxation 2 extends the charging window to cover the 8th through the 17th time slots. Figure 8 shows that as the allowed charging time slots are extended, the total load profile approaches a valley-filling profile, which verifies that the controllable customers are effective in relieving stress on the power grid.
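The effect of widening the charging window can be illustrated with water-filling, which computes the aggregate EV load that levels the total load inside the allowed slots. This is a simplified, hypothetical sketch: it ignores per-EV rate limits and individual constraints, uses a made-up base load rather than the data behind Figure 8, and the function name is illustrative. Because a wider window only enlarges the feasible set, the minimum of $\sum_t (\text{total load})^2$ cannot increase, so with the total energy fixed the variance of the total load cannot increase either.

```python
import numpy as np


def valley_fill(base, window, energy, iters=100):
    """EV load e >= 0 with sum(e) = energy, zero outside `window`, that
    minimizes sum((base + e)**2): e_t = max(lam - base_t, 0) on the window."""
    b = base[window]
    lo, hi = b.min(), b.max() + energy  # water levels bracketing the solution
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.maximum(lam - b, 0.0).sum() > energy:
            hi = lam
        else:
            lo = lam
    ev = np.zeros_like(base)
    ev[window] = np.maximum(0.5 * (lo + hi) - b, 0.0)
    return ev


T = 24
base = 60.0 - 25.0 * np.sin(np.pi * np.arange(T) / (T - 1))  # hypothetical base load
energy = 20 * 10.0                                           # 20 EVs, S_i = 10 each
slots = np.arange(T)
windows = {
    "original (slots 9-16)": (slots >= 8) & (slots <= 15),
    "Relaxation 2 (slots 8-17)": (slots >= 7) & (slots <= 16),
    "Relaxation 1 (all slots)": np.ones(T, dtype=bool),
}
variances = {}
for name, w in windows.items():
    total = base + valley_fill(base, w, energy)
    variances[name] = total.var()
    print(f"{name}: variance of total load = {variances[name]:.2f}")
```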


VI. Conclusion 

We have designed a framework for distributed charging control of EVs using online learning and online convex optimization. The proposed algorithm can be implemented without low-latency two-way communication between the distribution company and the EV customers, which makes it compatible with the current communication infrastructure and protocols in the smart grid.

Appendix A 

Proof of Proposition III.2

The update © yields 



$$x_i^{k+1} = \arg\min \; \eta_i x_i^\top M_i^{k+1} + \|x_i - h_i^{k+1}\|^2, \tag{33}$$

where M k is the prediction of the value x\ + D k . 

The corresponding update of the distribution company yields

$$\begin{bmatrix} h_1^{k+1} \\ \vdots \\ h_N^{k+1} \end{bmatrix} = \begin{bmatrix} h_1^{k} \\ \vdots \\ h_N^{k} \end{bmatrix} - \eta_u \begin{bmatrix} \frac{2}{NT}\left(D^k + \sum_{i\in\mathcal{N}} x_i^k\right) \\ \vdots \\ \frac{2}{NT}\left(D^k + \sum_{i\in\mathcal{N}} x_i^k\right) \end{bmatrix}, \tag{34}$$

Figure 7. The total load profiles with various numbers of inelastic customers, the optimal total load profile, and the base load profile.

Figure 8. The total load profile generated by OMD, the optimal total load profile, base load profile, and the total load profile with the relaxations.


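$$x^{k+1} = \arg\min \; \eta_u x^\top M^{k+1} + \|x - h^{k+1}\|^2, \tag{35}$$

where $h^{k+1} = [h_1^{k+1}, \ldots, h_N^{k+1}]^\top$ and $M^k$ is the prediction of

$$\begin{bmatrix} \frac{2}{NT}\left(D^k + \sum_{i\in\mathcal{N}} x_i^k\right) \\ \vdots \\ \frac{2}{NT}\left(D^k + \sum_{i\in\mathcal{N}} x_i^k\right) \end{bmatrix}. \tag{36}$$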


By comparing (34) and (35) with (32) and (33) and substituting $\eta_u = \eta_i/2$, $i \in \mathcal{N}$, we see that the updates (35) and (33) are identical. Since the updates coincide, the average regret of the distribution company converges to zero as $K \to \infty$.
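The key substitution can be checked numerically. The sketch below, with hypothetical random data and a hypothetical function name, assumes the price $p^k = (D^k + \sum_i x_i^k)/(NT)$, company gradient blocks equal to $2p^k$, and predictions with the same relationship (the company's $M$ stacks copies of $2M_i$). It confirms that, with $\eta_u = \eta_i/2$ (consistent with $\eta_u = 1/(2\sqrt{K})$ and $\eta_i = 1/\sqrt{K}$ in Algorithm 1), the stacked company step equals the per-customer steps, both for the $h$-update and for the unconstrained minimizer $h - \eta M/2$ of $\eta x^\top M + \|x - h\|^2$. Assuming the company's feasible set is the Cartesian product of the customers' sets, the Euclidean projection acts blockwise, so the projected points coincide as well.

```python
import numpy as np


def company_and_customer_steps(D, x, h, M_i, eta_i):
    """Return (company, customers) results for the h-step and the optimistic point."""
    N, T = x.shape
    eta_u = eta_i / 2.0                      # substitution used in the proof
    p = (D + x.sum(axis=0)) / (N * T)        # price p^k (assumed form)
    grad_u = np.tile(2.0 * p, (N, 1))        # stacked company gradient blocks
    M_u = np.tile(2.0 * M_i, (N, 1))         # stacked company prediction
    h_company = h - eta_u * grad_u
    h_customers = h - eta_i * p              # broadcast over the N customers
    point_company = h_company - 0.5 * eta_u * M_u
    point_customers = h_customers - 0.5 * eta_i * M_i
    return (h_company, point_company), (h_customers, point_customers)


rng = np.random.default_rng(0)
N, T = 4, 6
company, customers = company_and_customer_steps(
    D=rng.uniform(1.0, 2.0, T),
    x=rng.uniform(0.0, 1.0, (N, T)),
    h=rng.normal(size=(N, T)),
    M_i=rng.uniform(0.0, 1.0, T),
    eta_i=0.3,
)
```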


Appendix B 

Proof of Theorem IV.1

The proof technique that we use to derive the tracking regret bound in (19) is similar to the one used in [12, Theorem 4]. The main step is to bound the difference $c_u^k(x^k) - c_u^k(x^{k*})$ instead of the difference $c_u^k(x^k) - c_u^k(x^*)$ that is considered in the static regret (13). However, in [12], the authors derive tracking regret bounds for a different regret minimization algorithm rather than OMD.


Following the proof of [23, Lemma 2], we have

$$c_u^k(x^k) - c_u^k(x^{k*}) \le (x^k - x^{k*})^\top \nabla c_u^k(x^k) \tag{37}$$


and 

$$(x^k - x^{k*})^\top \nabla c_u^k(x^k) \le \eta_u \|\nabla c_u^k(x^k) - M^k\|_*^2 + \frac{1}{\eta_u}\left(D_{L_u}(x^{k*}, h_u^k) - D_{L_u}(x^{k*}, h_u^{k+1})\right). \tag{38}$$

Furthermore, 

$$\begin{aligned}
D_{L_u}(x^{k*}, h_u^k) - D_{L_u}(x^{k*}, h_u^{k+1})
&= L_u(x^{k*}) - L_u(h_u^k) - \nabla L_u(h_u^k)^\top (x^{k*} - h_u^k) \\
&\quad - L_u(x^{k*}) + L_u(h_u^{k+1}) + \nabla L_u(h_u^{k+1})^\top (x^{k*} - h_u^{k+1}) \\
&= L_u(h_u^{k+1}) - L_u(h_u^k) + \nabla L_u(h_u^{k+1})^\top (x^{(k+1)*} - h_u^{k+1}) \\
&\quad - \nabla L_u(h_u^k)^\top (x^{k*} - h_u^k) - \nabla L_u(h_u^{k+1})^\top (x^{(k+1)*} - x^{k*}).
\end{aligned} \tag{39}$$

The remainder of the proof follows by summing over $k = 1, \ldots, K$ and collecting terms.
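Identity (39) is purely algebraic, so it holds for any differentiable $L_u$. A quick numerical check, with the hypothetical choice $L_u(x) = \sum_t e^{x_t}$ and random vectors standing in for $x^{k*}$, $x^{(k+1)*}$, $h_u^k$, and $h_u^{k+1}$:

```python
import numpy as np


def L(x):
    return np.sum(np.exp(x))  # any differentiable convex function works here


def grad_L(x):
    return np.exp(x)


def bregman(x, h):
    """Bregman divergence D_L(x, h) = L(x) - L(h) - grad_L(h)^T (x - h)."""
    return L(x) - L(h) - grad_L(h) @ (x - h)


rng = np.random.default_rng(0)
x_k, x_k1, h_k, h_k1 = rng.normal(size=(4, 5))  # x^{k*}, x^{(k+1)*}, h^k, h^{k+1}

lhs = bregman(x_k, h_k) - bregman(x_k, h_k1)
rhs = (L(h_k1) - L(h_k)
       + grad_L(h_k1) @ (x_k1 - h_k1)
       - grad_L(h_k) @ (x_k - h_k)
       - grad_L(h_k1) @ (x_k1 - x_k))
```

When (39) is summed over $k$, the differences $L_u(h_u^{k+1}) - L_u(h_u^k)$ and the pairs $\nabla L_u(h_u^{k+1})^\top (x^{(k+1)*} - h_u^{k+1}) - \nabla L_u(h_u^k)^\top (x^{k*} - h_u^k)$ telescope, leaving the path-length term $\sum_k \nabla L_u(h_u^{k+1})^\top (x^{(k+1)*} - x^{k*})$ that characterizes tracking regret.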


Appendix C 

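Proof of Theorem IV.2

Following the proof of [23, Lemma 2], we have

$$c_u^k(x^k) - c_u^k(x^*) \le (x^k - x^*)^\top \nabla c_u^k(x^k), \tag{40}$$

and

$$(x^k - x^*)^\top \nabla c_u^k(x^k) \le \eta_u \|\nabla c_u^k(x^k)\|_*^2 + \frac{1}{\eta_u}\left(D_{L_u}(x^*, h_u^k) - D_{L_u}(x^*, h_u^{k+1})\right), \tag{41}$$

where

$$\nabla c_u^k(x^k) = \begin{bmatrix} \frac{2}{NT}\left(D^k + \sum_{i\in\mathcal{N}} x_i^k\right) \\ \vdots \\ \frac{2}{NT}\left(D^k + \sum_{i\in\mathcal{N}} x_i^k\right) \end{bmatrix}. \tag{42}$$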


For each $i \in \mathcal{N}_l$, the cost function $c_i^k$ is selected as a constant function, as noted in Section IV-B. The gradient of a lazy customer's cost function is therefore zero; namely, for each lazy customer $i \in \mathcal{N}_l$,



(43) 


where $e_i^k = -\frac{2}{NT}\left(D^k + \sum_{j\in\mathcal{N}} x_j^k\right)$. We move $e_i^k$, $i \in \mathcal{N}_l$, to the right-hand side of the inequality in (41). The remainder of the proof follows by summing over $k = 1, \ldots, K$ and collecting terms.
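The role of $e_i^k$ can be seen numerically: adding it to the stacked gradient zeroes the blocks that belong to lazy customers, whose own cost is constant, and leaves the other blocks unchanged. The sketch assumes gradient blocks $2(D^k + \sum_j x_j^k)/(NT)$ and $e_i^k$ equal to minus that block; the function name and the data are hypothetical.

```python
import numpy as np


def effective_gradient(D, x, lazy):
    """Stacked gradient of c_u^k plus e_i^k on lazy rows; returns (sum, e)."""
    N, T = x.shape
    block = 2.0 * (D + x.sum(axis=0)) / (N * T)   # assumed gradient block
    grad = np.tile(block, (N, 1))
    e = np.zeros_like(grad)
    e[lazy] = -block                              # e_i^k for i in N_l
    return grad + e, e


rng = np.random.default_rng(2)
D = rng.uniform(1.0, 2.0, 5)
x = rng.uniform(0.0, 1.0, (3, 5))
lazy = np.array([True, False, True])
g_eff, e = effective_gradient(D, x, lazy)
```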


Appendix D 

Proof of Theorem IV.3

First observe that, for $i \in \mathcal{N}_l$,

$$c_u^k(x^k) - c_u^k(x^*) = \left[c_u^k(x^k) - c_u^k(\tilde{x}^*)\right] + \left[c_u^k(\tilde{x}^*) - c_u^k(x^*)\right]. \tag{44}$$
Following the proof of [23, Lemma 2],

cUx k )~c k u (x*)<(x k -x*) T ^ u (x k ) 

< yl|V^(i fc )||^ (45) 

+ ±(D Lu (x*,h k u )-D Lu (x*,h k u +1 )). 

iJlL 

The remainder of the proof follows by summing over $k = 1, \ldots, K$ and collecting terms.
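The summing step relies on telescoping of the Bregman terms: for a fixed comparator $x$, $\sum_{k=1}^{K}\left(D_{L_u}(x, h_u^k) - D_{L_u}(x, h_u^{k+1})\right) = D_{L_u}(x, h_u^1) - D_{L_u}(x, h_u^{K+1}) \le D_{L_u}(x, h_u^1)$, because Bregman divergences of a convex function are nonnegative. A numerical check, assuming $L_u(x) = \|x\|^2$ so that $D_{L_u}(x, h) = \|x - h\|^2$, with random iterates:

```python
import numpy as np


def bregman_sq(x, h):
    """D_L(x, h) for L(x) = ||x||^2, which simplifies to ||x - h||^2."""
    return float(np.sum((x - h) ** 2))


rng = np.random.default_rng(0)
K, T = 50, 6
x_fixed = rng.normal(size=T)        # fixed comparator (hypothetical)
h = rng.normal(size=(K + 1, T))     # iterates h^1, ..., h^{K+1} stored as rows 0..K

terms = [bregman_sq(x_fixed, h[k]) - bregman_sq(x_fixed, h[k + 1]) for k in range(K)]
total = sum(terms)
```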